






Neural Information Processing Systems

R2/R3: 'To better judge the performance of the proposed method, it would be useful to include comparisons against existing approaches to the problem, such as [Meeds et al., NIPS 2007] and [Bittorf et al., NIPS 2012].' We have since conducted an experimental comparison to the LP approach of Bittorf et al. in the separable setting with binary T. We found that our approach is more robust to noise, and we plan to add these results in the final version. The model of Meeds et al. involves two binary factors in a three-factor factorization and is hence different from our factorization model. A comparison can still be performed by running our approach in a two-step manner. Provided that code can be obtained from Meeds et al., such a comparison will be included in the final version. Please note that the paper already contains a comparison to two methods based on alternating optimization (a standard approach to NMF), adapted to our specific factorization problem.




First provide a summary of the paper, and then address the following criteria: quality, clarity, originality, and significance. This paper shows that the skip-gram model of Mikolov et al., when trained with their negative-sampling approach, can be understood as a weighted matrix factorization of a word-context matrix whose cells are weighted by pointwise mutual information (PMI), which has long been known empirically to be a useful way of constructing word-context matrices for learning semantic representations of words. This is an important result, since it provides a link between two (apparently) very different methods for constructing word embeddings that performed well empirically but seemed on the surface to have nothing to do with each other. Using this insight, the authors then propose a new matrix construction and find that it performs very well on standard tasks. The paper is mostly admirably clear (see below for a few suggestions on where citations could be added to make the relevant related work clear) and is a very nice contribution toward explaining what is going on in these neural language-model embedding models.
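To make the connection concrete, the PMI-weighted matrix the review refers to can be sketched in a few lines of NumPy. The toy co-occurrence counts and the choice of rank below are illustrative assumptions, not from the paper; the shift by log k reflects the reviewed result that skip-gram with k negative samples implicitly factorizes a PMI matrix shifted by log k.

```python
import numpy as np

# Hypothetical word-context co-occurrence counts (rows = words,
# columns = contexts), for illustration only.
counts = np.array([
    [10., 2., 0.],
    [ 3., 8., 1.],
    [ 0., 1., 6.],
])

total = counts.sum()
p_wc = counts / total                              # joint P(w, c)
p_w = counts.sum(axis=1, keepdims=True) / total    # marginal P(w)
p_c = counts.sum(axis=0, keepdims=True) / total    # marginal P(c)

# Pointwise mutual information; cells with zero counts give -inf,
# which we map to 0 as is conventional.
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
pmi[np.isinf(pmi)] = 0.0

# Shifted positive PMI: subtract log k (k = number of negative
# samples) and clip at zero.
k = 5
sppmi = np.maximum(pmi - np.log(k), 0.0)

# A truncated SVD of this matrix yields low-dimensional word
# vectors analogous to the skip-gram embeddings.
u, s, vt = np.linalg.svd(sppmi)
d = 2
word_vecs = u[:, :d] * np.sqrt(s[:d])
```

The scaling of the left singular vectors by the square root of the singular values is one common convention for splitting the factorization symmetrically between word and context vectors.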






Regarding our proof techniques: the proof of Thm. 1, for the NTK with two layers and bias, borrows techniques from [6]. Our proof technique for deep networks uses the algebra of RKHSs and is therefore novel in this context. Thm. 2 derives bounds that follow from the relation between the Fourier expansion of the Laplace kernel and the NTK (established in Thm. 4), together with identifying the spaces fixed under the appropriate integral transform. Regarding 'why they need additional parameters a, b, c': we note that, analogously, the NTK becomes sharper for deeper networks.
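For context, the Fourier expansion mentioned above can be sketched as follows; the notation is ours and the details (normalization, exact eigenvalue decay) are assumptions rather than taken from the rebuttal. On the unit sphere, the Laplace kernel admits a Mercer expansion in spherical harmonics:

```latex
k(x, x') = e^{-c\|x - x'\|},
\qquad
k(x, x') = \sum_{\ell = 0}^{\infty} \lambda_\ell
           \sum_{m} Y_{\ell, m}(x)\, Y_{\ell, m}(x'),
\quad x, x' \in \mathbb{S}^{d-1},
```

where the $Y_{\ell,m}$ are spherical harmonics and the eigenvalues $\lambda_\ell$ decay polynomially in $\ell$. Comparing this decay with that of the NTK's eigenvalues is what allows conclusions about one kernel's RKHS to transfer to the other.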




Review for NeurIPS paper: Kernel Methods Through the Roof: Handling Billions of Points Efficiently


There is a consensus among the knowledgeable reviewers that this work makes a significant contribution to the kernel community. It integrates several practical techniques and engineering efforts to further improve the scalability of kernel machines. The techniques proposed in this work will permit the use of several GPUs when training kernel-based models on huge amounts of data, which I also see as a significant contribution. Regardless of the overall score, I think this paper deserves an oral, because it shows how to take full advantage of GPU hardware when solving learning problems with kernel methods. Scalability is one of the long-standing problems in kernel machines, but it has been largely neglected and under-appreciated in the past few years.